Quiz: Goals and Rewards

So far, you've seen one example of how to frame an agent's goal as the maximization of expected cumulative reward. In this quiz, you will investigate several more examples.

Escape the Maze

Consider an agent who would like to learn to escape a maze. Which reward signals will encourage the agent to escape the maze as quickly as possible? Select all that apply.

The reward is -1 for every time step that the agent spends inside the maze. Once the agent escapes, the episode terminates.

The reward is +1 for every time step that the agent spends inside the maze. Once the agent escapes, the episode terminates.

The reward is -1 for every time step that the agent spends inside the maze. Once the agent escapes, it receives a reward of +10, and the episode terminates.

The reward is 0 for every time step that the agent spends inside the maze. Once the agent escapes, it receives a reward of +1, and the episode terminates.

SOLUTION:

The reward is -1 for every time step that the agent spends inside the maze. Once the agent escapes, the episode terminates.
The reward is -1 for every time step that the agent spends inside the maze. Once the agent escapes, it receives a reward of +10, and the episode terminates.
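
To see why only these two signals reward a fast escape, it can help to compare the cumulative reward of a short episode with that of a long one under each option. The sketch below assumes no discounting (the quiz does not state a discount rate), and the names `maze_return`, `MAZE_SCHEMES`, and `encourages_fast_escape` are hypothetical helpers introduced only for illustration.

```python
# Each scheme is (reward per time step inside the maze, reward on escape).
# Assumption: returns are undiscounted, i.e. a plain sum of rewards.
MAZE_SCHEMES = {
    "-1 per step": (-1, 0),
    "+1 per step": (+1, 0),
    "-1 per step, +10 on escape": (-1, 10),
    "0 per step, +1 on escape": (0, 1),
}


def maze_return(scheme, steps_in_maze):
    """Cumulative reward for an episode that lasts steps_in_maze time steps."""
    per_step, on_escape = scheme
    return per_step * steps_in_maze + on_escape


def encourages_fast_escape(scheme, fast=5, slow=50):
    """True if escaping in `fast` steps earns strictly more than in `slow` steps."""
    return maze_return(scheme, fast) > maze_return(scheme, slow)


for name, scheme in MAZE_SCHEMES.items():
    print(name, "->", encourages_fast_escape(scheme))
```

Under these assumptions, the +1-per-step signal pays the agent for staying in the maze, and the 0-per-step signal yields the same cumulative reward no matter how long the escape takes, so neither favors a quick exit. With a discount rate below 1, the last option would also favor escaping sooner.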

Consider an agent who would like to learn to play a board game (like backgammon, chess, or checkers). Which reward signals will encourage the agent to win the game? Select all that apply.

The agent receives a reward only at the end of the game, and receives a reward of +1 if it wins, -1 if it loses, and 0 if the game is a draw.

The agent receives a reward of -1 for every time step that it is still playing the game; once the game ends, the episode terminates.

The agent receives a reward only at the end of the game, and receives a reward of -1 if it wins, +1 if it loses, and 0 if the game is a draw.

The agent receives a reward only at the end of the game, and receives a reward of +10 if it wins, -10 if it loses, and 0 if the game is a draw.

SOLUTION:

The agent receives a reward only at the end of the game, and receives a reward of +1 if it wins, -1 if it loses, and 0 if the game is a draw.
The agent receives a reward only at the end of the game, and receives a reward of +10 if it wins, -10 if it loses, and 0 if the game is a draw.
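
The same kind of check works for the board game options: a signal encourages winning only if a win earns more cumulative reward than a draw, and a draw more than a loss. The following is a minimal sketch, again assuming undiscounted returns; `TERMINAL_SCHEMES`, `prefers_winning`, and `per_step_penalty_return` are hypothetical names, not part of the quiz.

```python
# Terminal-only schemes: the single reward handed out when the game ends.
TERMINAL_SCHEMES = {
    "+1 win / -1 loss": {"win": 1, "draw": 0, "loss": -1},
    "-1 win / +1 loss": {"win": -1, "draw": 0, "loss": 1},
    "+10 win / -10 loss": {"win": 10, "draw": 0, "loss": -10},
}


def prefers_winning(scheme):
    """True if the return orders the outcomes as win > draw > loss."""
    return scheme["win"] > scheme["draw"] > scheme["loss"]


def per_step_penalty_return(num_steps):
    """Return under the -1-per-time-step option; the outcome plays no role."""
    return -1 * num_steps


for name, scheme in TERMINAL_SCHEMES.items():
    print(name, "->", prefers_winning(scheme))

# A quick game beats a long one under the per-step penalty, win or lose.
print(per_step_penalty_return(10) > per_step_penalty_return(40))
```

In this sketch the +1/-1 and +10/-10 schemes rank the outcomes identically, which is why both are accepted: scaling the rewards does not change which outcome the agent prefers. The per-step penalty only rewards ending the game quickly, so it cannot distinguish a fast loss from a fast win.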

Source: http://i.dailymail.co.uk/i/pix/2013/02/14/article-2278590-1792E332000005DC-394_634x615.jpg

SOLUTION:

The reward is +1 for every time step that the agent keeps the plate balanced on her head. If the plate falls, the episode terminates.
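
With this signal, the cumulative reward of an episode is simply the number of time steps the plate stayed balanced, so maximizing it means balancing for as long as possible. A small sketch, assuming no discounting and that no reward is given on the step where the plate falls; `balancing_rewards` is a hypothetical helper.

```python
def balancing_rewards(balanced_flags):
    """Collect +1 for each time step the plate is balanced; stop when it falls."""
    rewards = []
    for balanced in balanced_flags:
        if not balanced:
            break  # the plate fell, so the episode terminates here
        rewards.append(1)
    return rewards


# The plate stays up for three time steps, then falls.
episode = [True, True, True, False]
print(sum(balancing_rewards(episode)))
```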